SemanticBSDD

Improving the GraphQL, JSON and RDF Representations of the buildingSMART Data Dictionary

Vladimir Alexiev, Mihail Radkov, Nataliya Keberle

Objective

  • Highlight the defects in the original GraphQL implementation of bSDD
  • Give an overview of the refactored solution proposed by Ontotext
  • Give an overview of the proposed improvements

bSDD GraphQL Schema: Voyager

bSDD GraphQL Schema: PlantUML

PlantUML is used together with the soml2puml converter tool

Original GraphQL: Findings (1/3)

  • Reference entities ReferenceDocument, Country, Unit, Language are disconnected from the rest of the schema
  • Relation entities have only an incoming link but no outgoing link
  • Many entities cannot be queried directly from the Root
  • No backward arrows to get from a lower-level entity back to its “parent” entity
  • There are a number of parallel arrows (several links between the same pair of entities). A GraphQL schema can use parameters to distinguish between the different uses

Original GraphQL: Findings (2/3)

At a higher level of detail:

  • Property and ClassificationProperty are very similar, but there’s no inheritance/relation between them
  • PropertyValue and ClassificationPropertyValue are exactly the same, so can be reduced to one entity

Original GraphQL: Findings (3/3)

Mixture of singular/plural in property names

property/properties, relations, synonyms, countriesOfUse, relatedIfcPropertyNames, etc.

Refactored GraphQL: Improvements

  • All entities are queryable directly from the Root
  • Link deduplication
  • Each link is named the same as its target entity
  • Navigation between entities is bidirectional, e.g., Classification hierarchy can be navigated both up and down (parentClassification, childClassification)
  • A query can traverse a Relation entity to get data about the related entity:
    • Classification.relation -> ClassificationRelation.related -> Classification
    • Property.relation -> PropertyRelation.related -> Property
  • A single entity PropertyValue is used by both Property and ClassificationProperty
  • Property names are in the singular
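The bidirectional navigation and the traversal through a Relation entity can be illustrated with a small in-memory model. This is only a sketch: the field names parentClassification, childClassification, relation and related come from the list above, while relationType, the helper functions and the sample data are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Classification:
    name: str
    parentClassification: Optional["Classification"] = None
    childClassification: List["Classification"] = field(default_factory=list)
    relation: List["ClassificationRelation"] = field(default_factory=list)

    def add_child(self, child: "Classification") -> None:
        # Set both directions of the hierarchy link at the same time,
        # so the hierarchy can be navigated up and down.
        child.parentClassification = self
        self.childClassification.append(child)


@dataclass(eq=False)
class ClassificationRelation:
    relationType: str  # hypothetical field, not taken from the slides
    related: Classification


def related_names(c: Classification) -> List[str]:
    # Classification.relation -> ClassificationRelation.related -> Classification
    return [r.related.name for r in c.relation]
```

With this structure, a child reaches its parent through parentClassification, and a parent lists its children through childClassification, without a second lookup from the Root.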

GraphiQL

Refactored bSDD: SPARQL endpoint

Suggested Improvements

Presentation

  • Uniform identification for the search
  • Equal data retrieved from different APIs
  • Improve URL structure and consistency

Uniform Identification for the Search (1/3)

May 2023: IfcCableSegment has another id: https://search.bsdd.buildingsmart.org/Classification/Index/70992

Uniform Identification for the Search (2/3)

IfcCableSegment also has a unique URI:

https://identifier.buildingsmart.org/uri/buildingsmart/ifc-4.3/class/IfcCableSegmentCABLESEGMENT

CableSegment entity as displayed at the bSDD web site

Uniform Identification for the Search (3/3)

Non-unique identification violates the FAIR Findability principle

F1: (Meta)data are assigned a globally unique and persistent identifier

Equal Data Retrieved from Different APIs (1/2)

We have compared three representations returned by the bSDD server:

  • JSON from the GraphQL API
    • https://test.bsdd.buildingsmart.org/graphiql/
  • JSON from the REST (entity) API
    • curl https://identifier.buildingsmart.org/uri/buildingsmart/<domain>/class|prop/<name>
  • RDF from the REST (entity) API
    • curl -Haccept:text/turtle https://identifier.buildingsmart.org/uri/buildingsmart/<domain>/class|prop/<name>
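The two REST (entity) calls can also be prepared from a script. The sketch below only builds the request objects and does not send them; the function name entity_request and its parameters are assumptions, while the URL pattern and the text/turtle Accept header are the ones used in the curl commands.

```python
from urllib.request import Request

# Base of the entity URLs used in the curl commands.
BASE = "https://identifier.buildingsmart.org/uri/buildingsmart"


def entity_request(domain: str, kind: str, name: str, rdf: bool = False) -> Request:
    """Build (but do not send) a request for a bSDD class or property."""
    if kind not in ("class", "prop"):
        raise ValueError("kind must be 'class' or 'prop'")
    url = f"{BASE}/{domain}/{kind}/{name}"
    # Without an Accept header the JSON representation is requested;
    # with Accept: text/turtle the RDF representation is requested.
    headers = {"Accept": "text/turtle"} if rdf else {}
    return Request(url, headers=headers)
```

Passing the returned object to urllib.request.urlopen would perform the actual fetch, so the JSON and RDF responses for the same entity can be stored side by side and compared.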

Equal Data Retrieved from Different APIs (2/2)

We selected entities of each class that have the maximum number of filled fields, and compared the results returned by each API.

The differences are here:

Improve URL Structure and Consistency (1/7)

Recommendations on ontology URI design, including versioning and opaque URIs to maintain evolution and multilingualism inherent to bSDD, are described in Garijo & Poveda-Villalon, 2020.

Almost all bSDD domain URLs now have the same structure: https://identifier.buildingsmart.org/uri/<org>/<domain>-<version>

URIs can be more “hackable”, allowing users to navigate the hierarchy by pruning the URI: https://identifier.buildingsmart.org/uri/<org>/<domain>/<version>
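Pruning a hackable URI amounts to dropping trailing path segments one at a time. A minimal sketch follows, assuming the proposed <org>/<domain>/<version> layout; the function name ancestors and the keep_segments parameter are hypothetical, and whether each pruned URI actually resolves depends on the server.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit


def ancestors(uri: str, keep_segments: int = 1) -> List[str]:
    """Return the parent URIs obtained by pruning trailing path segments.

    keep_segments is the number of leading path segments that are never
    exposed on their own (here: the fixed "uri" prefix).
    """
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    result = []
    # Drop one trailing segment per step, stopping before the fixed prefix.
    for n in range(len(segments) - 1, keep_segments, -1):
        path = "/" + "/".join(segments[:n])
        result.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return result
```

For a versioned domain URI, the first result is the domain without version and the second is the organization.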

Improve URL Structure and Consistency (2/7)

  • In some cases, the <org> is repeated in the <domain> part
  • In some cases, the <org> name doesn’t quite mesh with the domain name, perhaps due to the way bSDD allocates <org> identifiers to bSDD contributors
    • bim-de/DINSPEC91400: the publisher of this spec is DIN (the German standards organization), not the bim-de initiative
    • digibase/volkerwesselsbv: bimregister.nl news from 2018 suggests that digibase is a new company/initiative within VolkerWessels
    • digibase/nen2699: the publisher of this spec is NEN (the Netherlands standards organization), not the digibase company/initiative
    • digibase/digibasebouwlagen: perhaps the org name digibase should not be repeated as the prefix of the domain bouwlagen (building layers)

Improve URL Structure and Consistency (3/7)

  • Explicate domain versions:

https://identifier.buildingsmart.org/uri/acca/ACCAtest-0.1

can become

https://identifier.buildingsmart.org/uri/acca/ACCAtest/0.1

A new entity DomainVersion can link all versions of a domain to its master Domain entity.
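The rewrite from <domain>-<version> to <domain>/<version> can be sketched as a small function. It assumes that the version is whatever follows the last hyphen of the domain segment, which holds for the examples in this deck but is not a documented bSDD rule; the function name explicate_version is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def explicate_version(uri: str) -> str:
    """Rewrite /uri/<org>/<domain>-<version>/... to /uri/<org>/<domain>/<version>/..."""
    parts = urlsplit(uri)
    segments = parts.path.split("/")  # ["", "uri", <org>, <domain>-<version>, ...]
    if len(segments) < 4 or segments[1] != "uri":
        raise ValueError("not a bSDD domain URI: " + uri)
    # Assumption: the version is the text after the LAST hyphen, so that
    # domain names containing hyphens (e.g. REX-OBJ-1.0) are kept intact.
    domain, sep, version = segments[3].rpartition("-")
    if not (sep and domain and version):
        return uri  # no version suffix found, leave the URI unchanged
    segments[3:4] = [domain, version]
    return urlunsplit(parts._replace(path="/".join(segments)))
```

Any path below the domain (such as /class/<name>) is carried over unchanged, so existing class and property URLs can be rewritten with the same function.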

Improve URL Structure and Consistency (4/7)

  • Declare URLs to be of type ID and use a mandatory field id
    • Most GraphQL implementations call this field simply id, whereas bSDD uses namespaceUri
    • Many nodes do not have their own namespaceUri field, or it is not fully populated

Improve URL Structure and Consistency (5/7)

  • Remove the overlap of Entity Classes with classificationTypes

The key field classificationType specifies the kind of classification.

count  classificationType      overlaps with entity
29     “DOMAIN”                Domain
18     “REFERENCE_DOCUMENT”    ReferenceDocument

Examples of unusual classifications:

https://identifier.buildingsmart.org/uri/ATALANE/REX-OBJ-1.0/class/589b06ad-f802-468b-939c-e60436601a7a is a “REFERENCE_DOCUMENT” with name “décret 2011-321 (23/03/2011)”.

Why is it not a ReferenceDocument entity?

Improve URL Structure and Consistency (6/7)

  • All entities should have a URL

All significant classes should have an ID, which in the case of RDF data is the node URL.

However, many bSDD classes don’t have such a field:

  • Domain, Property, Classification do have namespaceUri
  • Country, Language, Unit don’t have an ID but have a field (code, isocode) that can be used to make an ID, when prepended with an appropriate prefix.
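Minting an ID from an existing code field can be sketched as follows. The prefixes under example.org are placeholders chosen for illustration, not actual bSDD URLs, and the function name make_id is hypothetical.

```python
from urllib.parse import quote

# Hypothetical prefixes; a real deployment would choose its own.
ID_PREFIX = {
    "Country": "https://example.org/bsdd/country/",
    "Language": "https://example.org/bsdd/language/",
    "Unit": "https://example.org/bsdd/unit/",
}


def make_id(entity_type: str, code: str) -> str:
    """Prepend the prefix for entity_type to the entity's code or isocode."""
    code = code.strip()
    if not code:
        raise ValueError("cannot mint an ID from an empty code")
    # Percent-encode the code so characters like "/" stay inside one path segment.
    return ID_PREFIX[entity_type] + quote(code, safe="")
```

Percent-encoding matters mostly for units, whose codes may contain characters that are not safe in a URL path.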

Improve URL Structure and Consistency (7/7)

Property and ClassificationProperty are two different classes, but the latter does not have a distinct URL in GraphQL and JSON.

The same URL is overloaded to identify entities of both classes.

ClassificationProperty entities are thus “second class” entities and are not returned separately by the JSON or RDF entity API, but only as part of the respective Classification.

curl https://identifier.buildingsmart.org/uri/buildingsmart/ifc-4.3/class/IfcCableSegmentCABLESEGMENT/ACResistance

{"":["Classification with namespace URI 'https://identifier.buildingsmart.org/uri/buildingsmart/ifc-4.3/class/IfcCableSegmentCABLESEGMENT/ACResistance' not found"]}

Modelling issues

Modelling issues (1/8): Unify Solutions to Model Complex Properties

The key attribute propertyValueKind has values COMPLEX and COMPLEX_LIST used in combination with connectedProperties. These key values are defined for Property and ClassificationProperty.

  • However, connectedPropertyCodes is defined only for Property
  • More importantly, these key values are never used
  • connectedProperty is used only on 7 Properties (and not ClassificationProperties)

Instead of using connectedPropertyCodes to describe complex properties, some providers have used classifications with the type COMPOSED_PROPERTY.

Modelling issues (2/8): Improve Modelling of Dynamic Properties

12385 Properties are declared as isDynamic (135250 are not).

However, the field dynamicParameterPropertyCodes (used to compute the dynamic property) is always empty, so how can one know which “sub-properties” to use?

Additionally, dynamicParameterPropertyCodes is a String, but should be [Property], i.e. an array of Properties.

Modelling issues (3/8): Improve Relations Between Entities

Modelling issues (4/8): Add More Entities

bSDD includes numerous string attributes (codes or URLs) that should be converted to relations (object fields) to improve the connectedness of the bSDD GraphQL graph.

is a classification field (String)   should be
physicalQuantity                     (New) PhysicalQuantity
propertySet                          (New) PropertySet
subdivisionsOfUse                    (New) [CountrySubdivision]
version                              (New) DomainVersion
replaced/(-ing)ObjectCodes           some kind of objects (currently they are empty)

Modelling issues (5/8): Use Class Inheritance

Property and ClassificationProperty differ in only 7 fields:

  • connectedPropertyCodes (String) and relations (PropertyRelation) belong uniquely to Property
  • isRequired (Boolean), isWritable (Boolean), predefinedValue (String), propertySet (String) and symbol (String) belong uniquely to ClassificationProperty.

Since there are no rules on which fields of Property to reuse in ClassificationProperty, the latter type copies most of the fields from the former.
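One way to express this with inheritance is a shared base type that carries the common fields, with each subtype declaring only its own fields. The sketch below is an assumption about how that could look: PropertyBase is a hypothetical name, only two of the shared fields are shown, and relations is typed as a list of strings as a stand-in for PropertyRelation.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class PropertyBase:
    # Illustrative subset of the fields shared by both types.
    namespaceUri: str
    name: str


@dataclass
class Property(PropertyBase):
    # Fields that belong uniquely to Property.
    connectedPropertyCodes: List[str] = field(default_factory=list)
    relations: List[str] = field(default_factory=list)  # stand-in for PropertyRelation


@dataclass
class ClassificationProperty(PropertyBase):
    # Fields that belong uniquely to ClassificationProperty.
    isRequired: Optional[bool] = None
    isWritable: Optional[bool] = None
    predefinedValue: Optional[str] = None
    propertySet: Optional[str] = None
    symbol: Optional[str] = None


def own_fields(cls) -> List[str]:
    """List the fields a subtype declares beyond the shared base."""
    shared = {f.name for f in fields(PropertyBase)}
    return [f.name for f in fields(cls) if f.name not in shared]
```

The shared fields are then declared once, and the difference between the two types is visible directly in the subtype definitions.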

Modelling issues (6/8): Improve Representation of PropertyValues

PropertyValue and ClassificationPropertyValue are structured values with rich fields: code, value, namespaceUri, description, sortNumber.

However, most structured values we’ve seen have only code, value.

This has multiple problems:

  • Individual values have no description (description is not filled out)
  • Some values are described in the property definition, intermingling multiple descriptions together
  • The “standard” values NOTKNOWN, OTHER, UNSET are not described at all.
  • Values have no namespaceUri, precluding unique identification.

Modelling issues (7/8): Improve representation of predefinedValue

allowedValues (and its deprecated variant possibleValues) store structured values (ClassificationPropertyValue).

However, their “sibling” property predefinedValue holds just a String, which means that even in the future, predefinedValue cannot be an enumeration value identified globally with a URL.

Modelling issues (8/8): Improve Multilingual Support

bSDD is advertised as a multilingual dictionary. In the GraphQL API, one can specify a desired language when fetching classifications and properties.

However, most domains are present in one language only (unilingual).

Data quality

We encountered various data quality issues:

  • leading, trailing, consecutive whitespace
  • inconsistent physical quantities and units
  • no rules on missing data
  • Unicode problems
  • unresolved HTML entities
  • bad classification relations (broken links)
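Several of these issues can be handled by a simple text cleanup step. The following is a minimal sketch, not the cleanup actually applied to bSDD: the function name clean_text is hypothetical, and treating the Unicode problems as a normalization issue (NFC) is an assumption, since the slides do not specify them.

```python
import html
import re
import unicodedata


def clean_text(s: str) -> str:
    # Resolve HTML entities such as &amp; or &#233;.
    s = html.unescape(s)
    # Assumption: compose characters so that visually equal strings compare equal.
    s = unicodedata.normalize("NFC", s)
    # Collapse consecutive whitespace (including newlines) to one space.
    s = re.sub(r"\s+", " ", s)
    # Remove leading and trailing whitespace.
    return s.strip()
```

Entities are resolved before whitespace is collapsed, so whitespace that was encoded as an entity is normalized as well.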

Implementing Improvements

We implemented many (but not all) of the improvements suggested above by using the following process:

  • Fetching bSDD data as JSON
  • Converting it to RDF using SPARQL Anything
  • Loading it to GraphDB
  • Refactoring the RDF using SPARQL Update

The results are available at the endpoint

Conclusions and Future Work

Here are further ideas for improvement:

  • improve the bSDD ontology
  • implement more radical data model refactoring to convert “strings” (countries, reference documents, etc.) into “things”
  • link bSDD units of measure to QUDT ontology
  • perform deeper data quality analysis using SHACL shapes generation and validation provided by Ontotext Platform Semantic Objects
  • address and resolve more data quality issues, including
    • seeking correlation between dimension vectors, units of measure and physical quantity,
    • parsing out enumeration values from Property/ClassificationProperty descriptions and creating corresponding PropertyValue lists
  • make more graph visualizations
  • obtain more interesting statistics using SPARQL

Acknowledgements

Funding: ACCORD project, Horizon Europe, grant #101056973

Data: buildingSMART Data Dictionary (bSI credits: Leon van Berlo, Artur Tomczak, Erik Baars)
